Trojan Source is a software vulnerability that abuses Unicode's bidirectional formatting characters so that source code is displayed differently from how it is actually compiled or executed. The exploit takes advantage of how text in scripts with different reading directions is encoded and displayed on computers. It was discovered by Nicholas Boucher and Ross Anderson at the University of Cambridge in late 2021.
Relevant Unicode bidirectional formatting characters

| Abbreviation | Name | Description |
|---|---|---|
| LRE | Left-to-Right Embedding | Try treating following text as left-to-right. |
| RLE | Right-to-Left Embedding | Try treating following text as right-to-left. |
| LRO | Left-to-Right Override | Force treating following text as left-to-right. |
| RLO | Right-to-Left Override | Force treating following text as right-to-left. |
| LRI | Left-to-Right Isolate | Force treating following text as left-to-right without affecting adjacent text. |
| RLI | Right-to-Left Isolate | Force treating following text as right-to-left without affecting adjacent text. |
| FSI | First Strong Isolate | Force treating following text in direction indicated by the next character. |
| PDF | Pop Directional Formatting | Terminate nearest LRE, RLE, LRO, or RLO. |
| PDI | Pop Directional Isolate | Terminate nearest LRI, RLI, or FSI. |
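As a rough illustration, the characters in the table can be looked up with Python's standard `unicodedata` module. The names `BIDI_CONTROLS` and `describe` are chosen for this sketch only; the code points are those assigned by the Unicode Standard.

```python
import unicodedata

# Abbreviation -> character, using the code points assigned by Unicode.
BIDI_CONTROLS = {
    "LRE": "\u202a",
    "RLE": "\u202b",
    "PDF": "\u202c",
    "LRO": "\u202d",
    "RLO": "\u202e",
    "LRI": "\u2066",
    "RLI": "\u2067",
    "FSI": "\u2068",
    "PDI": "\u2069",
}


def describe(abbrev):
    """Return (code point, Unicode name, bidi class) for one abbreviation."""
    ch = BIDI_CONTROLS[abbrev]
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for abbrev in BIDI_CONTROLS:
        print(abbrev, *describe(abbrev))
```

For each of these characters, the bidirectional class reported by `unicodedata.bidirectional` is the same string as the abbreviation used in the table.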
Vulnerable Python code

| Source code with hints | Source code displayed visually | Source code interpreted |
|---|---|---|
| '''Add num1 and num2, and [RLI] ''' ;return | '''Add num1 and num2, and return; ''' | '''Add num1 and num2, and ''' ; return |
| return num1 + num2 | return num1 + num2 | return num1 + num2 |
In the above example, the RLI mark (right-to-left isolate) causes the text that follows it to be displayed in a different order than it is stored and interpreted: in the file, the triple quote comes first (ending the string), followed by a semicolon (starting a new statement), and finally the premature return (which exits the function and skips any code below it). The line break terminates the RLI mark, preventing its effect from flowing into the code below. Some source code editors and IDEs honor the Bidi character and rearrange the code for display without any visual indication that it has been rearranged, so a human code reviewer would not normally notice the change. A compiler or interpreter, however, may ignore the Bidi character and process the characters in their stored order rather than the order displayed. As a result, code that visually appeared to be non-executable, such as text inside a comment or string, could end up being executed. Formatting marks can be combined and nested to create more complex attacks.
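The effect can be reproduced with a minimal sketch. The table shows only the function body, so the enclosing function header `def add(num1, num2):` is an assumption added here to make the snippet runnable. The source text is built from an explicit `\u2067` escape so that the RLI character is visible in the listing.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE

# Hypothetical wrapper function around the two body lines from the table.
# In the stored character order, the docstring closes before ";return".
source = (
    "def add(num1, num2):\n"
    f"    '''Add num1 and num2, and {RLI} ''' ;return\n"
    "    return num1 + num2\n"
)

namespace = {}
exec(source, namespace)  # Python compiles the characters in stored order

result = namespace["add"](1, 2)
print(result)  # None: the early return runs, the addition never does
```

The RLI character ends up inside the docstring, so the interpreter accepts the file without complaint, and the final `return num1 + num2` is never reached.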
While the behavior the attack relies on is not strictly an error, many compilers, interpreters, and websites added warnings or mitigations for the exploit. Both GCC and LLVM received requests to address it. Shortly after the exploit was published, Marek Polacek submitted a patch to GCC that implemented a warning for potentially unsafe directional characters; this functionality was merged for GCC 12 under the -Wbidi-chars flag. LLVM also merged similar patches. Rust addressed the exploit in version 1.56.1, which rejects code that includes the characters by default. The developers of Rust found no vulnerable packages published prior to the fix.
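The general idea behind such warnings can be sketched as a simple scan for the nine formatting characters. This is an illustrative approximation only, not the logic used by GCC, LLVM, or Rust; the function name `find_bidi_controls` and the shape of its return value are assumptions made for this example.

```python
# The nine bidirectional formatting characters (U+202A..U+202E, U+2066..U+2069).
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def find_bidi_controls(text):
    """Return (line, column, code point) for each Bidi control in text.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings


if __name__ == "__main__":
    sample = "x = 1\ny = '\u2067'\n"
    for line_no, col, code_point in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {code_point}")
```

A scan like this flags every occurrence, including legitimate uses in right-to-left text, so a practical tool would probably need a way to allow characters that are properly terminated or explicitly approved.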
Many source code editors and IDEs now make these potentially unsafe characters more visible. Visual Studio Code now renders control characters by default. Notepad++ and Vim already made these characters visible, as noted in the research paper.
Red Hat issued an advisory on its website, rating the exploit as "moderate". GitHub published a warning on its blog and updated its website to show a dialog box when Bidi characters are detected in a repository's code.